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ABSTRACT 

PELE, Protein Energy Landscape Exploration, our 
novel technology based on protein structure predic- 
tion algorithms and a Monte Carlo sampling, is 
capable of modelling the all-atom protein-ligand dy- 
namical interactions in an efficient and fast manner, 
with two orders of magnitude reduced computa- 
tional cost when compared with traditional molecu- 
lar dynamics techniques. PELE's heuristic approach 
generates trial moves based on protein and ligand 
perturbations followed by side chain sampling 
and global/local minimization. The collection of 
accepted steps forms a stochastic trajectory. 
Furthermore, several processors may be run in 
parallel towards a collective goal or defining 
several independent trajectories; the whole proced- 
ure has been parallelized using the Message 
Passing Interface. Here, we introduce the PELE 
web server, designed to make the whole process 
of running simulations easier and more practical 
by minimizing input file demand, providing user- 
friendly interface and producing abstract outputs 
(e.g. interactive graphs and tables). The web server 
has been implemented in C++ using Wt (http://www. 
webtoolkit.eu) and MySQL (http://www.mysql.com). 
The PELE web server, accessible at http://pele.bsc. 
es, is free and open to all users with no login 
requirement. 

INTRODUCTION 

With modern advances in molecular target therapies, it is 
becoming essential to elucidate the atomic mechanism for 
protein-ligand dynamical recognition. Achieving such 
detailed information requires not only the description of 
the ligand's induced fit, but also of its migration. Ligand 
binding, for example, can be impeded by mutations in the 
entrance pathway/channel. These challenges demand 



techniques capable of an atomic dynamical study with 
microsecond time scale resolution. Computer simula- 
tions offer such comprehensive information, although 
traditionally limited to sampling a fraction of the con- 
formational space. 

A large amount of work has been devoted to develop 
docking software (and servers) capable of modelling static 
protein-ligand complexes (1-5). Achieving a dynamical 
view of the process is considerably more challenging. 
Within the last 5 years, however, there has been a great 
effort in developing specialized software and hardware to 
perform microsecond molecular dynamics (MD) and 
reveal the biophysics behind molecular associations (6- 
9). Unfortunately, these remarkable developments still 
require a significant computational cost: days/weeks of 
hundreds of processors (on expensive hardware units). 
To circumvent this problem, we have designed PELE, an 
acronym for Protein Energy Landscape Exploration, 
which is a combination of Monte Carlo stochastic 
approach with protein structure prediction methods (9). 
PELE has shown to provide accurate ligand-induced fit 
(10,11) and migration results (12,13), and to reproduce 
the conformational sampling in microsecond MD 
trajectories with two orders of magnitude reduced compu- 
tational cost (14). Using PELE with an inexpensive 
multicore machine (16-32 processors), for example, we 
can map the unbiased ligand migration from the solvent 
to the active site in an overnight period of time (10). 
Access to this technology will be a big advantage for 
many scientists working in different areas. 

Briefly, PELE's heuristic approach generates trial 
moves based on protein and ligand perturbations 
followed by side chain sampling and global (or local) mini- 
mization. Protein perturbation is based on a combination 
of normal modes obtained in an anisotropic network 
model (ANM) (11), where a constrained minimization is 
used with a force placed on each alpha carbon (or all 
heavy atoms). Normal mode analysis techniques are cur- 
rently widely used, and web servers exist which are capable 
of quickly compute the modes (12,13). Ligand perturb- 
ation includes a displacement plus sampling of its 
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dihedrals (and dihedrals of those side chains in direct 
contact). After a final minimization using the OPLS- 
2005 force field and a generalized Born implicit solvent 
(where explicit water can be included), the trial move is 
accepted or rejected based on a Metropolis criterion. The 
collection of accepted steps forms a stochastic trajectory. 
The whole procedure also has been parallelized using the 
Message Passing Interface (MPI) to take advantage of 
High Performance Computing (HPC) facilities wherever 
available. 

In this communication, the PELE web server is pre- 
sented, which provides a free, accessible and user- 
friendly interface for using PELE from everywhere with 
the Internet access. To the best of our knowledge, there 
are no other web servers that can do ligand-induced fit and 
migration (entrance and escape) studies in a timely 
manner achievable by our web server today. 



MATERIALS AND METHODS 

PELE methodology 

The PELE algorithm combines Monte Carlo moves, 
protein ANM perturbations, rotamer library side-chain 
optimizations, truncated Newton minimizations and 
Metropolis acceptance tests. As illustrated in Figure 1, 
PELE's heuristic algorithm involves consecutive iteration 
of three main steps: localized perturbation, side chain 
sampling and minimization. The main steps are described 
below in a nutshell; the detailed method and its initial tests 
have been published elsewhere (14—16). 




Figure 1. Flowchart of the PELE methodology illustrating the three 
iterative main steps. 



Localized perturbation 

After an energy calculation for the initial structure, the 
procedure begins with the generation of a perturbation in 
the system. In studies of ligand diffusion or ligand-induced 
fit, the perturbation starts with a random translation and 
rotation of the ligand. Some ligands can be treated as rigid 
bodies; hence, only three rotational and three translational 
degrees of freedom are required. Flexible ligands, on the 
other hand, cannot be adequately described as a single rigid 
unit. The perturbation in this case includes additional 
degrees of freedom from the dihedral angles of rotatable 
bonds. Next, a series of steric filters are applied to deter- 
mine whether there is any contact between the ligand and 
the backbone of the protein. If any such contacts are found, 
the perturbation is rejected. Using this method, hundreds of 
perturbations are generated within seconds, and the one 
with the best energy is selected. 

In addition to the ligand, the perturbation might include 
the backbone of the protein (or the backbone surrounding 
the ligand) by performing a minimization, where the alpha- 
carbons are driven to a new position derived from a small 
displacement in a low frequency mode (the lowest eigenvec- 
tors) from an ANM approach, a simple model for normal 
mode analysis (11). This is accomplished by a quick mini- 
mization with a harmonic constraint on each displaced 
alpha-carbon. The magnitude of the displacement, the 
nature of the eigenvector and the strength of the constraint 
are user-adjustable variables. Such a procedure aims to in- 
corporate the global motion of the protein. 

Side chain sampling 

The algorithm continues by optimizing all side chains 
local to the ligand using a rotamer library (17,18) at 
rotamer resolution of 10, 20 or 30 degrees. The sampling 
algorithm uses steric filtering and clustering to reduce the 
number of rotamers to be minimized. For the protein, the 
energy of all side chains is computed before and after each 
ANM perturbation. In this way, at the side chain sampling 
step, the most excited side chains, i.e. those having a larger 
increase in energies as a consequence of the ANM move, 
can be optimized. 

Minimization 

Finally, the last step involves the minimization of a user- 
defined region, including at least all residues local to the 
atoms involved in the two previous steps, using the 
truncated Newton minimizer, the OPLS-2005 force field 
and a surface generalized Born (SGB) implicit solvent. 
The minimization is intended to locate a local minimum 
after the initial perturbation and side chain rearrange- 
ments introduced in the previous steps. 

These three steps devise a move, which is accepted, 
defining a new minimum, or rejected on the basis of a 
Metropolis criterion for a given temperature: 

AV< 0 

— a r 

e K B T < R 

That means by a decrease of the potential surface, 
A V < 0, or by satisfying the second criterion, where K B 
is the Boltzmann constant, T the temperature chosen for 
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the simulation and R is a random number with a [0, 1] 
range. The result of this procedure is a series of local 
minima forming a stochastic trajectory. Optionally, the 
algorithm can be biased to attain different goals by ad- 
justing different reaction coordinates where geometry or 
energy parameters are prioritized. For example, the 
motion of a ligand can be constrained within a distance 
from the active site, etc. Furthermore, the procedure has 
been parallelized using the MPI communications protocol, 
with the option of interchanging coordinates between dif- 
ferent trajectories. Whenever any trajectory is significantly 
further along a given reaction coordinate(s) than any of 
the other trajectories, the trailing trajectory is abandoned 
and restarted from the position of the leading trajectory. 
This allows an efficient sampling of the configurational 
space towards one defined objective: entrance/escape of 
the ligand, optimization of the protein-ligand binding 
energy, etc. 

PELE benchmarks 

Although this article aims to introduce the PELE web 
server, we will give here a short description of previous 
benchmark studies. Several applications have described 
exit and entry pathways, comparing its accuracy with ex- 
perimental data. In our first study (14), we obtained the 
migration for Cytochrome P450 camphor, myoglobin and 
the fatty acid-binding proteins. Further studies compared 
PELE's migration pathways in truncated haemoglobin 
with kinetic experimental data for the wild-type and the 
W8F mutant (19). More recently, a comprehensive study 
on 97 difficult induced fit cases, including cross- and apo- 
docking, has shown the capabilities of PELE in docking 
refinement (15). Furthermore, the study underlined the 
goodness of the OPLS interaction energy when scoring 
docking poses within one ligand. Binding site search and 
induced fit docking has been performed, in collaboration 
with experimental studies, in aryl-alcohol oxidase (20,21), 
in mTOR kinases (22), in anti-apoptotic Bcl-2 receptors 
(23) and in different globins (10). The mTOR and Bcl-2 
studies illustrate the combination of PELE with docking 
scoring techniques to discriminate different ligands. At the 
level of protein dynamics, in absence of ligand, we have 
compared PELE's conformational search with microsec- 
ond MD simulations in ubiquitin and with metadynamics 
calculations on T4 lysozyme (16). The results clearly show 
a good agreement in the sampling of PELE with that of 
more sophisticated simulations. 

DESCRIPTION OF THE WEB SERVER 

Input 

Detailed information about how to submit new jobs and 
format of the input can be found in the Help section of the 
web server. Here, only the most important items are 
described. 

The only mandatory input file that must be provided by 
the user is the structures, i.e. protein(s) and possibly 
ligand(s), waters and ions with all hydrogen atoms 
added, in the PDB format (http://www.wwpdb.org/docs. 
html). Then a control script can be made based on the 



selected ready-made script or uploaded by the user. The 
PELE program reads all the parameters for a simulation 
run as directives written in a control script with .con as its 
file extension. There are five optimized ready-made scripts, 
each with a detailed description, to perform routine tasks: 
(i) protein local motion; (ii) normal mode exploration; (hi) 
ligand binding refinement; (iv) unconstrained ligand ex- 
ploration and binding site search; and (v) driving the 
ligand. By selecting any of these scripts, only few param- 
eters should be entered (default values are also provided). 
Additionally, advanced users can upload a custom-made 
script instead of those mentioned above or edit the 
generated script before submission. Hovering the pointer 
over each parameter shows a descriptive text about its 
purpose. For scripts requiring a ligand perturbation, 
chain ID and residue number of the ligand in the PDB 
file must also be specified. Multiple ligands can be 
included in the input file, although only one can be per- 
turbed at the present time. 

There are some prerequisites for the PDB input file, (i) 
The server does not add any hydrogen atoms or modify 
any protonation state, i.e. the user should know the 
system and decide the best protonation state for both 
the protein and the ligand. The PDB2PQR (24) and 
MolProbity (25) servers can be used to add hydrogen 
atoms to proteins in an intelligent way by optimizing the 
hydrogen-bond network, (ii) To recognize the right 
protonation state, the user should change the name of 
some residues in the PDB file as follows: HIS/HIE/HIP 
for delta, epsilon and protonated histidine, respectively. 
LYS->LYN, ARG->ARN for neutral lysine and 
glutamic, respectively; ASP->ASH, GLU->GLH for 
protonated aspartic and glutamic, respectively; CYS-> 
CYT, CYS->CYX for deprotonated negative and 
neutral cysteine, respectively, (hi) The protein structure 
must be complete, without any missing residues or 
atoms. Again, the PDB2PQR server can be used for 
adding missing side-chain atoms (24). (iv) Water residues 
(if present) should be named SPC or HOH and defined as 
HETATM records. 

There is a possibility to upload a sample PDB file for 
each ready-made script as test runs, optimized for getting 
typical results achievable by each script. By checking the 
contents of these PDB files and the corresponding 
generated control files, users may gain more insights 
about how to prepare their own PDB and control files. 

It is possible to assign more CPUs or higher wall clock 
for each submitted job in the Job Settings panel. The 
default values are 4 CPUs and 24 h, respectively, with 16 
CPUs and 72 h as the upper limits. By tuning these values, 
it is possible to reduce the pending time of a job in case of 
heavy load of compute cluster (see Implementation section 
below). As an option, user can also select additional 
metrics (e.g. RMSD, surface area, interatomic distances, 
etc.) to be appeared in the output file for each step. 

Output and representation of results 

On successful submission of a job, user automatically is 
redirected to a result page for monitoring the progress of 
the computations and viewing the results in real time. This 
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page can be bookmarked for a later correspondence and 
will remain valid for 10 days after initial submission. A 
typical simulation run takes only few hours (depending on 
the number of steps and CPUs selected for the job). PELE 
is capable of reproducing, in ~100h of CPU time, the 
ligand escape and entry pathways, and in ~10h of CPU 
time, the active site induced fit. 

The first part of the results is an interactive graph that 
plots the selected metrics (listed as columns in the metrics 
data, i.e. second part of the results below) on the fly 
(Figure 2). Once a job starts to run in the compute 
cluster and log file is in place, the interactive graph is 
produced and updated as the job progresses and structures 
can be downloaded from the trajectories. The default 
graph plots total energy of the system over the trajectory 
for all the CPUs, but the user can change it at any moment 
(Figure 3A). For instance, it can be changed to plot ligand 
binding energy versus the RMSD from the native struc- 
ture (Figure 3B). 

The second part of the results is a table of all the 
selected metrics separated by the CPU number. Here, 
based on the information presented in the plot (e.g. struc- 
ture with the lowest binding energy), user can select one or 
more structures from the trajectories and download them 
as a PDB file for further modelling purposes. The whole 
trajectory file (or files in case of having more than one 
processor) containing all the sampled conformations can 
be downloaded from the files section, which is the third 
and last part of the results. This file can be used as an 
input for interactive viewers, e.g. VMD (http://www.ks. 
uiuc.edu/Research/vmd/), or PyMol (http://www.pymol. 
org) to animate ligand/protein motions. In the Examples 
section of the web server, there are several animations 
prepared in this way. 

Implementation 

The PELE web server has been implemented in C++ using 
Wt (http://www.webtoolkit.eu), a C++ library for de- 
veloping web applications and MySQL (http://www. 
mysql.com) as back-end database. The original PELE 
code has been implemented in Fortran; nevertheless, 
reimplementation of the old code as a C++ library is a 
work in progress. Jobs submitted by the web server are 
transferred to a dedicated compute cluster consisting of 
144 cores (AMD Opteron 2.4 GHz) at the time of this 
writing and will be queued based on their requested 
wall clock and number of CPUs. In this way, several 
submitted jobs can be run in parallel based on the avail- 
able CPUs. 

Community help as public forums, bug tracking and 
feature request are available through PELEs Redmine 
site (http://pele.bsc.es/redmine), which has been imple- 
mented using Redmine (http://www.redmine.org), a free 
and open source, web-based project management and 
bug-tracking system. 

Aspirin binding to phospholipase A2 as a ligand migration 
test case 

As a simple test case for ligand migration, we consider the 
non-biased aspirin binding to phospholipase A2. This test 
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Figure 2. Screenshot of the results obtained after 1 h, using ready-made 
script #4 (unconstrained ligand exploration and binding site search) 
exploiting 12 CPUs, for aspirin binding to phospholipase A2 test 
case (PDB ID: lOXR). The four steps drawn in red demonstrate 
how the interactive graph can be used to recognize the processor that 
found the lowest RMSD from the native structure (CPU#10), and sub- 
sequently, how to use that information for selecting and downloading 
the corresponding structure (Step 63) from the trajectory. 



case was first showed by Mike Kuiper, currently a com- 
putational scientist at the Victorian Life Science 
Computation Initiative, at the 2011 Annual Structure- 
Based Drug Design conference in Boston. By running 
MD on 256 cores in an IBM Blue Gene Supercomputer 
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Figure 3. Two samples of what can be obtained using the interactive 
graph for a submitted job. (A) Graph showing the total energy of the 
system for all the CPUs over the trajectory, obtained for the aspirin 
binding to phospholipase A2 test case (PDB ID: lOXR) after 1 h. (B) 
Graph showing the scatter plot of the ligand binding energy versus the 
RMSD from the native structure for the same system as (A). As can be 
seen, the structure with the lowest RMSD from the native found by the 
CPU number 10, also corresponds to the lowest binding energy. 



for several days, he could observe spontaneous binding in 
the active site, reproducing within 2 A the native crystal 
structure (PDB ID: lOXR) (26). As starting system he 
used four ionized aspirins all placed randomly away 
from the protein. Although in terms of size and ligand 
flexibility this is an easy system, similar results have been 
obtained using MD for more complex systems (9). 

Starting with a structure where aspirin (only one copy 
of the ligand) is located in the bulk solvent, with a 27 A 
ligand RMSD to the crystal bound structure, PELE is 
capable of reproducing the bound crystal structure in 
less than a 2 h x 8 processors run. Each processor 
performs an independent search where it combines large 
and small translations of the ligand. If a native structure is 
provided, information of the bound structure is only used 
to compute the RMSD for comparison purpose. 
Figure 3A shows the evolution of the energy along the 
simulation (with an initial drop due to the lack of a 
previous minimization of the crystal). Figure 3B shows 



the binding energy against the ligand RMSD to the 
bound crystal structure. Binding energies are obtained 
by computing the protein-ligand interaction energy for 
each local minima produced by PELE. Notably, the simu- 
lation is capable of reproducing the bound structure and, 
more importantly, of identifying it based on the protein- 
ligand interaction energy. The binding mechanism can be 
reproduced, for example, from the compilation of differ- 
ent local minima. In the Example section of the server, 
there is a movie of the binding event for this test case, 
where the presence of explicit water molecules and a 
calcium ion in the active site can be observed. 

Thus, remarkably, a free search of the binding process 
from solution can be accomplished within an all atom 
model in only few hours of CPU. More difficult test 
cases, such as the binding in the Src kinase recently per- 
formed by the D. E. Shaw Research group (9), would 
require only 10-20 CPU hours on 24 processors. 
Certainly this constitutes a breakthrough when 
compared with MD studies, allowing for screening of 
several compounds in a timely manner. This test case 
has been made as the default 'Sample PDB' for the 
'Unconstrained ligand exploration and binding search' 
ready-made script (14,15). 

Nuclear hormone receptors as a ligand refinement 
test case 

Another ready-made script, which we believe of large 
utility, is the 'Ligand binding refinement'. This application 
allows the refinement of initial docking poses where we 
expect significant protein and ligand reorganization 
(induced fit). The script will produce small ligand transla- 
tions coupled to large rotations within a given radius from 
the active site, ligand RMSD change, etc. The script might 
be also useful, for example, in refining the structure after a 
free ligand migration (the first test case above) or after 
driving the ligand from outside to the active site. These 
more broad searches might use larger translation and/or 
steering towards the active site; they might not include 
the fine sampling required to refine the structure within 
the active site cavity. As an example, we include here the 
refining of a benzoxazin derivative from the 
mineralcorticoid ligand-binding domain receptor (PDB 
ID: 3VHV) (27). After placing the ligand outside and 
running a free search, the ligand reached a structure 
with a 7 A ligand RMSD from the bound crystal 
(Figure 4A). Within 2 h x 16 o processors using the 
refining script, we reached a 0.9 A structure (Figure 4B). 
Furthermore, the bound structure is clearly identified by 
the binding energy (Figure 4C). This test is currently the 
'Sample PDB' for the 'Ligand binding refinement' ready- 
made script (14,15). 

We should emphasize here that the ready-made scripts 
could benefit (decreasing the simulation time significantly) 
from a system-specific parameter tuning. This is particu- 
larly true for the ligand refinement case, where 
reproducing large (or rare) induced fit events might be 
hard (in agreement with the difficulty in accurately 
evaluating the binding free energy). By system-specific 
parameters, we refer to possible promoting modes, 
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Ligand RMSD 

Figure 4. Results obtained from the active site refinement test case. (A) 
Initial structure after ligand dynamics with a 7 A ligand RMSD from 
the bound crystal (PDB ID: 3VHV). (B) Refined structure produced by 
PELE with 0.9 A ligand RMSD. (C) Binding energy against the ligand 
RMSD along the refinement simulation. Ligand RMSDs are computed 
after alpha-carbon proteins alignment. 



ligand partial charges and rotamers, protein protonation 
states, etc. Many tips are provided in the Examples section 
of the web server. 



CONCLUSIONS AND FUTURE DEVELOPMENT 

The motivation behind the PELE web server was to 
provide, to the large community of scientists especially 
those who work on molecular target therapies, a fast 
and accurate tool capable of obtaining an atomic 
detailed mechanism of the protein-ligand induced fit, of 
its recognition process and of the ligand migration. 
Understanding these aspects is essential for the accurate 
prediction of protein-ligand interactions and plays an im- 
portant role in rational drug design, including, but not 
limited to, bypassing drug resistance induced by protein 
mutations. 

PELE, our novel technology based on protein structure 
prediction algorithms and a Monte Carlo sampling, is 
capable of describing the all-atom dynamical interactions 
between proteins and ligands with two orders of magni- 
tude reduced computational cost. As PELE is a versatile 
and complex program that supports numerous directives 
and options in the control script, this web server makes 
the whole process of running PELE simulations easier and 
more practical by minimizing input file demand, providing 
user-friendly interface and producing abstract outputs 
(e.g. interactive graphs and tables). We are confident 
that PELE web server will be of high interest and practical 
utility for a wide range of scientists working in structural 
bioinformatics, computational chemistry and biology, mo- 
lecular biophysics, rational drug design and related fields. 

The PELE web server is an on-going project, and our 
intention is to improve it constantly. The following are the 
highlights of the most important features considered for 
the future versions: 

• Visualization of the results using a WebGL (http:// 
www.khronos.org/webgl/)-based molecular viewer. 

• Interactive addition of hydrogen atoms and assign- 
ment of protonation state using the molecular viewer. 

• Fixing missing residues and side chains. 

• Support for the nucleic acids, i.e. DNA/RNA. 

• CUDA/OpenCL implementation of the PELE code 
for GPUs. 
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